47 research outputs found

    Tupleware: Redefining Modern Analytics

    Full text link
    There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world---petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems

    Partitioning vs. Replication for Token-Based Commodity

    Get PDF
    The proliferation of e-commerce has enabled a new set of applications that allow globally distributed purchasing of commodities such as books, CDs, travel tickets, etc., over the Internet. These commodities can be represented on line by tokens, which can be distributed among servers to enhance the performance and availability of such applications. There are two main approaches for distributing such tokens ? replication and partitioning. Token replication requires expensive distributed synchronization protocols to provide data consistency, and is subject to both high latency and blocking in case of network partitions. On the other hand, token partitioning allows many transactions to execute locally without any global synchronization, which results in low latency and immunity against network partitions. In this paper, we examine the Data-Value Partitioning (DVP) approach to token-based commodity distribution. We propose novel DVP strategies that vary in the way they redistribute tokens among the servers of the system. Using a detailed simulation model and real Internet message traces, we investigate the performance of our DVP strategies by comparing them against a previously proposed scheme, Generalized Site Escrow (GSE), which is based on replication and escrow transactions. Our experiments demonstrate that, for the types of applications and environment we address, replication-based approaches are neither necessary nor desirable, as they inherently require quorum synchronization to maintain consistency. We show that DVP, primarily due to its ability to provide high server autonomy, performs favorably in all cases studied. (Also cross-referenced as UMIACS-TR-2000-6

    A Security Infrastructure for Mobile Transactional Systems

    Get PDF
    In this paper, we present an infrastructure for providing secure transactional replication support for peer-to-peer, decentralized databases. We first describe how to effectively provide protection against external threats, malicious actions by servers not authorized to access data, using conventional cryp-tography-based mechanisms. We then classify and present algorithms that provide protection against internal threats, malicious actions by authenticated servers that misrepresent protocol-specific infor-mation. Our approach to handling internal threats uses both cryptographic techniques and modifica-tions to the update commit criteria. The techniques we propose are unique in that they not only enable a tradeoff between performance and the degree of tolerance to malicious servers, but also allow for indi-vidual servers to support non-uniform degrees of tolerance without adversely affecting the performance of the rest of the system. We investigate the cost of our security mechanisms in the context of Deno: a prototype object replica-tion system designed for use in mobile and weakly-connected environments. Experimental results reveal that protecting against internal threats comes at a cost, but the marginal cost for protecting against larger cliques of malicious insiders is generally low. Furthermore, comparison with a decentralized Read-One Write-All protocol shows that our approach performs significantly better under various workloads. (Also cross-referenced as UMIACS-TR-2000-59

    S-Store: Streaming Meets Transaction Processing

    Get PDF
    Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already supports, and we can concentrate on the additional implementation features that are needed to support streaming. Similar implementations could be done using other main-memory OLTP platforms. We show that we can actually achieve higher throughput for streaming workloads in S-Store than an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm, and show how S-Store matches and sometimes exceeds their performance while providing stronger transactional guarantees

    An End-to-end Neural Natural Language Interface for Databases

    Full text link
    The ability to extract insights from new data sets is critical for decision making. Visual interactive tools play an important role in data exploration since they provide non-technical users with an effective way to visually compose queries and comprehend the results. Natural language has recently gained traction as an alternative query interface to databases with the potential to enable non-expert users to formulate complex questions and information needs efficiently and effectively. However, understanding natural language questions and translating them accurately to SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. In this paper, we present DBPal, a novel data exploration tool with a natural language interface. DBPal leverages recent advances in deep models to make query understanding more robust in the following ways: First, DBPal uses a deep model to translate natural language statements to SQL, making the translation process more robust to paraphrasing and other linguistic variations. Second, to support the users in phrasing questions without knowing the database schema and the query features, DBPal provides a learned auto-completion model that suggests partial query extensions to users during query formulation and thus helps to write complex queries

    S-Store: a streaming NewSQL system for big velocity applications

    Get PDF
    First-generation streaming systems did not pay much attention to state management via ACID transactions (e.g., [3, 4]). S-Store is a data management system that combines OLTP transactions with stream processing. To create S-Store, we begin with H-Store, a main-memory transaction processing engine, and add primitives to support streaming. This includes triggers and transaction workflows to implement push-based processing, windows to provide a way to bound the computation, and tables with hidden state to implement scoping for proper isolation. This demo explores the benefits of this approach by showing how a naïve implementation of our benchmarks using only H-Store can yield incorrect results. We also show that by exploiting push-based semantics and our implementation of triggers, we can achieve significant improvement in transaction throughput. We demo two modern applications: (i) leaderboard maintenance for a version of "American Idol", and (ii) a city-scale bicycle rental scenario

    Light-Weight Currency Management Mechanisms in Deno

    No full text
    This paper discusses the currency management mechanisms used in Deno, a replicated-object storage syste
    corecore